Stable TableRow converted from BQ types #5536

Merged on Jan 10, 2025 (7 commits)


@RustedBones RustedBones commented Dec 19, 2024

Coder[TableRow] is destructive (it is a dummy JSON serializer), so we should make sure that a TableRow object converted from a BQ model is stable after serialization.

We currently have issues with:

  • long values, which are serialized as strings to avoid overflow
  • float values, which are read back as double
  • json values, which are read back as nested TableRow objects
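
The first two points can be illustrated without Beam or Jackson. The sketch below is plain Scala with hypothetical helper names (`RoundTripSketch`, `longViaDouble`, `longViaString`, `floatReadBack`); it only assumes that a JSON number may be parsed as an IEEE double, and it does not reproduce the actual coder.

```scala
// Hypothetical illustration; not the code from this PR.
object RoundTripSketch {
  // A Long pushed through a Double can lose its low-order digits.
  def longViaDouble(l: Long): Long = l.toDouble.toLong

  // Writing the Long as a string keeps every digit.
  def longViaString(l: Long): Long = l.toString.toLong

  // A Float read back as a Double is widened: it carries the Float's
  // rounding error and no longer equals the Double with the same decimal text.
  def floatReadBack(f: Float): Double = f.toDouble
}
```

For example, `9007199254740993L` (2^53 + 1) does not survive `longViaDouble`, and `floatReadBack(0.1f)` is not equal to `0.1`, although narrowing it back to a Float recovers `0.1f`.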

As a side effect, I needed to avoid the toString conversion when converting a TableRow back to a BQ typed model.

val provider: OverrideTypeProvider =
  OverrideTypeProviderFinder.getProvider
val s = q"$tree.toString"
Contributor Author

We were forcing toString before converting back to the desired type.
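
One way to convert a value without a blanket toString is to match on its runtime type. The sketch below is an assumption about the general approach, not the macro code in this PR: `CastSketch` and `asLong` are hypothetical names, and it assumes a long field may arrive either as a number or as the string form written to avoid overflow.

```scala
// Hypothetical helper; the real conversion lives in the BQ type macros.
object CastSketch {
  def asLong(value: Any): Long = value match {
    case n: java.lang.Number => n.longValue() // native numeric value, no string round trip
    case s: String           => s.toLong      // string form produced by serialization
    case other =>
      throw new IllegalArgumentException(s"Cannot convert $other to Long")
  }
}
```

With this shape, `CastSketch.asLong(42L)` and `CastSketch.asLong("42")` both yield `42L`, so the conversion accepts a row both before and after it has passed through the coder.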

@RustedBones RustedBones force-pushed the bq-typed-stable-table-row branch from fdd4fbb to d8d30c8 Compare December 19, 2024 15:50
Comment on lines 56 to 58
// f is a field from TableRow.
// Jackson ObjectMapper will fail with such key
key <- Gen.alphaStr.retryUntil(_ != "f")
Contributor Author

@RustedBones RustedBones Dec 19, 2024

I'm wondering what happens if a BQ table has a field named f. We probably can't use the TableRow API in that case.

Contributor Author

I did a quick test: reading a column named 'f' fails to create a TableRow item.

See apache/beam#33531

@RustedBones RustedBones force-pushed the bq-typed-stable-table-row branch from d8d30c8 to d76d226 Compare December 19, 2024 16:12

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 84.95575% with 17 lines in your changes missing coverage. Please review.

Project coverage is 61.85%. Comparing base (d4b1ced) to head (101e852).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../spotify/scio/bigquery/syntax/TableRowSyntax.scala | 84.68% | 17 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5536      +/-   ##
==========================================
+ Coverage   61.44%   61.85%   +0.41%     
==========================================
  Files         312      312              
  Lines       11105    11207     +102     
  Branches      776      791      +15     
==========================================
+ Hits         6823     6932     +109     
+ Misses       4282     4275       -7     


Contributor Author

@RustedBones RustedBones left a comment

Ideally, all the TableRow -> Scala cast logic should be shared with TableRowOps.

@RustedBones RustedBones force-pushed the bq-typed-stable-table-row branch from a820b88 to f92e641 Compare January 8, 2025 09:02
@RustedBones RustedBones force-pushed the bq-typed-stable-table-row branch from 7694e15 to c869876 Compare January 8, 2025 18:26
row.getRecord("record") shouldBe expected
}

it should "#3378: not throw an NPE on a non-existent subrecord" in {
Contributor Author

@RustedBones RustedBones Jan 8, 2025

This is not very logical.
I added a getRecordOpt so that callers can get an Option[TableRow] instead of a null.
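
The idea behind an Option-returning accessor can be sketched with a plain `java.util.Map` standing in for TableRow (which is itself a map of field names to values). The object name, the `Row` alias, and this standalone signature are assumptions for illustration; the method added in the PR is a syntax extension on TableRow and may differ.

```scala
import java.util.{Map => JMap}

// Hypothetical stand-in; not the TableRowSyntax implementation.
object RecordOptSketch {
  type Row = JMap[String, AnyRef]

  // Option(...) turns a missing or null field into None instead of null.
  def getRecordOpt(row: Row, name: String): Option[Row] =
    Option(row.get(name)).map(_.asInstanceOf[Row])
}
```

Callers can then chain lookups on a missing subrecord with `flatMap` rather than guarding against a NullPointerException.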

private lazy val mapper = new ObjectMapper()
  .registerModule(new JavaTimeModule())
  .registerModule(new JodaModule())
  .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)
Contributor

what's this needed for?

Contributor Author

This is the configuration used by TableRowJsonCoder.

It forces dates to be serialized as strings. From the Jackson documentation for WRITE_DATES_AS_TIMESTAMPS:

Feature that determines whether Date (and date/time) values (and Date-based things like java.util.Calendars) are to be serialized as numeric time stamps (true; the default), or as something else (usually textual representation).
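
A minimal sketch of the effect, assuming jackson-databind and jackson-datatype-jsr310 are on the classpath. `DateMapperSketch` and `render` are hypothetical names, and the Joda module from the snippet above is left out to keep the example small.

```scala
import com.fasterxml.jackson.databind.{ObjectMapper, SerializationFeature}
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule
import java.time.LocalDate

// Hypothetical wrapper around a mapper configured like the one above.
object DateMapperSketch {
  val mapper: ObjectMapper = new ObjectMapper()
    .registerModule(new JavaTimeModule())
    .disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS)

  // With the feature disabled, the date is written in its textual ISO form.
  def render(date: LocalDate): String = mapper.writeValueAsString(date)
}
```

Here `DateMapperSketch.render(LocalDate.of(2025, 1, 10))` produces the JSON string `"2025-01-10"` rather than a numeric representation.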

@RustedBones RustedBones merged commit b136e11 into main Jan 10, 2025
12 checks passed
@RustedBones RustedBones deleted the bq-typed-stable-table-row branch January 10, 2025 14:57